This manuscript describes a technique for computing partial rank-revealingfactorizations, such as, e.g, a partial QR factorization or a partial singularvalue decomposition. The method takes as input a tolerance $\varepsilon$ and an$m\times n$ matrix $A$, and returns an approximate low rank factorization of$A$ that is accurate to within precision $\varepsilon$ in the Frobenius norm(or some other easily computed norm). The rank $k$ of the computedfactorization (which is an output of the algorithm) is in all examples weexamined very close to the theoretically optimal $\varepsilon$-rank. Theproposed method is inspired by the Gram-Schmidt algorithm, and has the same$O(mnk)$ asymptotic flop count. However, the method relies on randomizedsampling to avoid column pivoting, which allows it to be blocked, and henceaccelerates practical computations by reducing communication. Numericalexperiments demonstrate that the accuracy of the scheme is for every matrixthat was tried at least as good as column-pivoted QR, and is sometimes muchbetter. Computational speed is also improved substantially, in particular onGPU architectures.
展开▼
机译:该手稿描述了一种用于计算部分等级揭示因子分解的技术,例如部分QR因子分解或部分奇异值分解。该方法将公差$ \ varepsilon $和$ m \ times n $矩阵$ A $作为输入,并返回近似的低阶因式分解$ A $,该精度在Frobenius范数的精度$ \ varepsilon $内(或其他易于计算的规范)。在所有示例中,我们检查了计算因式分解(它是算法的输出)的等级$ k $,非常接近理论上最佳的$ \ varepsilon $-等级。所提出的方法是受Gram-Schmidt算法启发的,并且具有相同的$ O(mnk)$渐近翻牌计数。但是,该方法依赖于随机采样来避免列旋转,从而使列被阻塞,从而通过减少通信来加速实际计算。数值实验表明,该方案的精度对于尝试至少与以列为中心的QR一样好的每个矩阵,有时要好得多。计算速度也大大提高,尤其是在GPU架构上。
展开▼